1 Methodology

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## āœ” dplyr     1.1.4     āœ” readr     2.1.5
## āœ” forcats   1.0.0     āœ” stringr   1.5.1
## āœ” ggplot2   3.5.1     āœ” tibble    3.2.1
## āœ” lubridate 1.9.3     āœ” tidyr     1.3.1
## āœ” purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## āœ– dplyr::filter() masks stats::filter()
## āœ– dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggrepel)
library(edgeR)
## Loading required package: limma
library(limma)
library(biomaRt)
library(fgsea)

1.1 Loading the dataset and obtaining the model.

reads=read.table("Analyses/Inputs/Processed_datasets/D1/Start_Practice4/reads.txt")
metadata=read.table("Analyses/Inputs/Processed_datasets/D1/Start_Practice4/metadata.txt")
feature_data=read.table("Analyses/Inputs/Processed_datasets/D1/Start_Practice4/featuredata.txt")

I used a interaction model in which batch (library date), population and condition terms are included as well as the interaction between condition and population. The interaction will give the information to work with since the response expression differences among populations (up/down regulated for a given population in response to each tratment) for each treatment will reveal the differences in response among these two groups. Thresholds chosen for logFC and Fdr are 0.05 since I didn’t want to be restrictive in the selection of differentially expressed genes for both groups will be severely altered if higher thresholds of logFC are proposed and will lose significance if false discovery rate cut was lower than 0.05.

metadata$condition=factor(metadata$condition,levels=unique(metadata$condition))
metadata$population=factor(metadata$population,levels=c("EUB","AFB"))
dge <- DGEList(counts = reads)
dge <- calcNormFactors(dge)
design=model.matrix(~condition+library_date+population+condition*population,data=metadata)
v=voom(dge,design,plot=TRUE)

fit <-lmFit(v,design)
fit <- eBayes(fit)
colnames(fit)
##  [1] "(Intercept)"                     "conditionPAM3CSK4"              
##  [3] "conditionR848"                   "conditionLPS"                   
##  [5] "conditionIAV"                    "library_date1/22/14 17:19"      
##  [7] "library_date1/28/14 11:07"       "library_date1/9/14 16:03"       
##  [9] "library_date2/11/14 10:14"       "library_date2/11/14 18:08"      
## [11] "library_date2/13/14 8:57"        "library_date2/13/14 9:39"       
## [13] "library_date2/6/14 14:18"        "library_date2/6/14 14:20"       
## [15] "library_date3/11/14 15:48"       "library_date3/19/14 17:59"      
## [17] "library_date3/19/14 18:00"       "library_date3/21/14 14:13"      
## [19] "library_date3/31/14 16:20"       "library_date3/6/14 18:41"       
## [21] "library_date3/7/14 13:26"        "library_date3/7/14 9:07"        
## [23] "library_date4/10/14 16:00"       "library_date4/14/14 17:43"      
## [25] "library_date4/3/14 14:22"        "library_date4/7/14 17:43"       
## [27] "library_date4/7/14 17:45"        "library_dateunknown"            
## [29] "populationAFB"                   "conditionPAM3CSK4:populationAFB"
## [31] "conditionR848:populationAFB"     "conditionLPS:populationAFB"     
## [33] "conditionIAV:populationAFB"
betas=data.frame(fit$coefficients)
ts=data.frame(fit$t)
ps=data.frame(fit$p.value)
BH=data.frame(ps)

dif_AF_PAM=data.frame(logFC=betas$conditionPAM3CSK4.populationAFB, t=ts$conditionPAM3CSK4.populationAFB, Fdr=BH$conditionPAM3CSK4.populationAFB)
rownames(dif_AF_PAM)<-rownames(betas[betas$conditionPAM3CSK4.populationAFB==TRUE])

dif_AF_R848=data.frame(logFC=betas$conditionR848.populationAFB, t=ts$conditionR848.populationAFB, Fdr=BH$conditionR848.populationAFB)
rownames(dif_AF_R848)<-rownames(betas[betas$conditionR848.populationAFB==TRUE])
                       
dif_AF_LPS=data.frame(logFC=betas$conditionLPS.populationAFB,t=ts$conditionLPS.populationAFB,Fdr=BH$conditionLPS.populationAFB)
rownames(dif_AF_LPS)<-rownames(betas[betas$conditionLPS.populationAFB==TRUE])

dif_AF_IAV=data.frame(logFC=betas$conditionIAV.populationAFB,t=ts$conditionIAV.populationAFB,Fdr=BH$conditionIAV.populationAF)
rownames(dif_AF_IAV)<-rownames(betas[betas$conditionIAV.populationAFB==TRUE])
threshold=0.05
th_FC=0.05

dif_AF_PAM$DE="No"
dif_AF_PAM$DE[which(dif_AF_PAM$Fdr<threshold & abs(dif_AF_PAM$logFC)>th_FC)]="Yes"
hits_PAM=length(which(dif_AF_PAM$Fdr<threshold & abs(dif_AF_PAM$logFC)>th_FC))
volcano_PAM=ggplot(dif_AF_PAM)+geom_point(aes(x=logFC,y=-log10(Fdr),color=DE),size=0.5)+
  scale_colour_manual(values = c("grey20", "firebrick4"))+ggtitle(paste0("AF effects in PAM response:  ",hits_PAM, " DEGs"))+
  geom_hline(yintercept=-log10(threshold)) + 
  geom_vline(xintercept=c(th_FC,-th_FC))

dif_AF_R848$DE="No"
dif_AF_R848$DE[which(dif_AF_R848$Fdr<threshold & abs(dif_AF_R848$logFC)>th_FC)]="Yes"
hits_R848=length(which(dif_AF_R848$Fdr<threshold & abs(dif_AF_R848$logFC)>th_FC))
volcano_R848=ggplot(dif_AF_R848)+geom_point(aes(x=logFC,y=-log10(Fdr),color=DE),size=0.5)+
  scale_colour_manual(values = c("grey20", "firebrick4"))+ggtitle(paste0("AF effects in R848 response:  ",hits_R848, " DEGs"))+
  geom_hline(yintercept=-log10(threshold)) +     
  geom_vline(xintercept=c(th_FC,-th_FC))

dif_AF_LPS$DE="No"
dif_AF_LPS$DE[which(dif_AF_LPS$Fdr<threshold & abs(dif_AF_LPS$logFC)>th_FC)]="Yes"
hits_LPS=length(which(dif_AF_LPS$Fdr<threshold & abs(dif_AF_LPS$logFC)>th_FC))
volcano_LPS=ggplot(dif_AF_LPS)+geom_point(aes(x=logFC,y=-log10(Fdr),color=DE),size=0.5)+
  scale_colour_manual(values = c("grey20", "firebrick4"))+ggtitle(paste0("AF effects in LPS response:  ",hits_LPS, " DEGs"))+
  geom_hline(yintercept=-log10(threshold)) + 
  geom_vline(xintercept=c(th_FC,-th_FC))

dif_AF_IAV$DE="No"
dif_AF_IAV$DE[which(dif_AF_IAV$Fdr<threshold & abs(dif_AF_IAV$logFC)>th_FC)]="Yes"
hits_IAV=length(which(dif_AF_IAV$Fdr<threshold & abs(dif_AF_IAV$logFC)>th_FC))
volcano_IAV=ggplot(dif_AF_IAV)+geom_point(aes(x=logFC,y=-log10(Fdr),color=DE),size=0.5)+
  scale_colour_manual(values = c("grey20", "firebrick4"))+ggtitle(paste0("AF effects in IAV response:  ",hits_IAV, " DEGs"))+
  geom_hline(yintercept=-log10(threshold)) + 
  geom_vline(xintercept=c(th_FC,-th_FC))
gridExtra::grid.arrange(grobs=list(volcano_PAM, volcano_R848, volcano_LPS, volcano_IAV), nrows=2)

  • Finally, the chosen genes for the enrichment analysis will be those that surpass the thresholds selected for differential expression
d_AF_PAM <- data.frame(dif_AF_PAM[dif_AF_PAM$DE=="Yes",])
d_AF_R84 <- data.frame(dif_AF_R848[dif_AF_R848$DE=="Yes",] )
d_AF_LPS <- data.frame(dif_AF_LPS[dif_AF_LPS$DE=="Yes",])
d_AF_IAV <- data.frame(dif_AF_IAV[dif_AF_IAV$DE=="Yes",])

1.2 Defining the clusters of genes we’re going to analyze.

I defined a number of 6 clusters to obtain up to 3 levels of expression in each up and down regulated genes for each treatment. I’ll be obtaining groups of differential expression in terms of logFC for further analysis. Results are ploted in a volcano plot where only first selected differentially expressed genes are plotted and divided in groups using different colors for each cluster.

clusters_PAM_AF <- hclust(dist(d_AF_PAM$logFC))
clusters_R84_AF <- hclust(dist(d_AF_R84$logFC))
clusters_LPS_AF <- hclust(dist(d_AF_LPS$logFC))
clusters_IAV_AF <- hclust(dist(d_AF_IAV$logFC))
clusters_PAM_AF_cut <- cutree(clusters_PAM_AF,k=6)
clusters_R84_AF_cut <- cutree(clusters_R84_AF,k=6)
clusters_LPS_AF_cut <- cutree(clusters_LPS_AF,k=6)
clusters_IAV_AF_cut <- cutree(clusters_IAV_AF,k=6)

d_AF_PAM$clusters = as.factor(clusters_PAM_AF_cut)
d_AF_R84$clusters = as.factor(clusters_R84_AF_cut)
d_AF_LPS$clusters = as.factor(clusters_LPS_AF_cut)
d_AF_IAV$clusters = as.factor(clusters_IAV_AF_cut)
v_PAM = ggplot(d_AF_PAM)+geom_point(aes(x=logFC,y=-log(Fdr),col=clusters)) +
  scale_color_manual(values=c("#DC143C", "#708090", "#A0522D", "#CD5C5C", "#696969", "#BC8F8F")) +
  ggtitle('PAM3CSK4 treatment response for AF population interaction')

v_R848 = ggplot(d_AF_R84)+geom_point(aes(x=logFC,y=-log(Fdr),col=clusters)) +
  scale_color_manual(values=c("#DC143C", "#708090", "#A0522D", "#CD5C5C", "#696969", "#BC8F8F")) +
  ggtitle('R848 treatment response for AF population interaction')
  
v_LPS = ggplot(d_AF_LPS)+geom_point(aes(x=logFC,y=-log(Fdr),col=clusters)) +
  scale_color_manual(values=c("#DC143C", "#708090", "#A0522D", "#CD5C5C", "#696969", "#BC8F8F")) +
  ggtitle('LPS treatment response for AF population interaction')

v_IAV = ggplot(d_AF_IAV)+geom_point(aes(x=logFC,y=-log(Fdr),col=clusters)) +
  scale_color_manual(values=c("#DC143C", "#708090", "#A0522D", "#CD5C5C", "#696969", "#BC8F8F")) +
  ggtitle('IAV treatment response for AF population interaction')
gridExtra::grid.arrange(grobs=list(v_PAM, v_R848, v_LPS, v_IAV), ncol=2)

1.3 Gene ontology enrichment

library(clusterProfiler)
## 
## clusterProfiler v4.10.1  For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
## 
## If you use clusterProfiler in published research, please cite:
## T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141
## 
## Attaching package: 'clusterProfiler'
## The following object is masked from 'package:biomaRt':
## 
##     select
## The following object is masked from 'package:purrr':
## 
##     simplify
## The following object is masked from 'package:stats':
## 
##     filter
library(org.Hs.eg.db)
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following object is masked from 'package:limma':
## 
##     plotMA
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## The following objects are masked from 'package:dplyr':
## 
##     combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:clusterProfiler':
## 
##     rename
## The following objects are masked from 'package:lubridate':
## 
##     second, second<-
## The following objects are masked from 'package:dplyr':
## 
##     first, rename
## The following object is masked from 'package:tidyr':
## 
##     expand
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname
## 
## Attaching package: 'IRanges'
## The following object is masked from 'package:clusterProfiler':
## 
##     slice
## The following object is masked from 'package:lubridate':
## 
##     %within%
## The following objects are masked from 'package:dplyr':
## 
##     collapse, desc, slice
## The following object is masked from 'package:purrr':
## 
##     reduce
## 
## Attaching package: 'AnnotationDbi'
## The following object is masked from 'package:clusterProfiler':
## 
##     select
## The following object is masked from 'package:dplyr':
## 
##     select
## 
library(AnnotationDbi)

A gene ontology enrichment is performed for each cluster in order to know the prevalent functions gene clusters are involved in. The threshold for maximal p-value used in this method is 0.05 in order to obtain as much information as possible without compromising significance. I created a function to reduce the amount of code needed to save results in the corresponding data frame.

GO_enrich <- function(cluster,d_cond_data.frame,dif_cond_data.frame){
  
  rownames(d_cond_data.frame)=rownames(dif_cond_data.frame[dif_cond_data.frame$DE=="Yes",])
  
  enrichGO(
                gene          = rownames(d_cond_data.frame)[which(d_cond_data.frame$cluster==cluster)],
                universe      = rownames(d_cond_data.frame),
                keyType       = 'ENSEMBL',
                OrgDb         = org.Hs.eg.db,
                ont           = "BP",
                pAdjustMethod = "BH",
                pvalueCutoff = 0.05)
}
GO_1_PAM <- GO_enrich(1,d_AF_PAM, dif_AF_PAM)
GO_2_PAM <- GO_enrich(2,d_AF_PAM, dif_AF_PAM)
GO_3_PAM <- GO_enrich(3,d_AF_PAM, dif_AF_PAM)
GO_4_PAM <- GO_enrich(4,d_AF_PAM, dif_AF_PAM)
GO_5_PAM <- GO_enrich(5,d_AF_PAM, dif_AF_PAM)
GO_6_PAM <- GO_enrich(6,d_AF_PAM, dif_AF_PAM)
GO_1_R848 <- GO_enrich(1,d_AF_R84, dif_AF_R848)
GO_2_R848 <- GO_enrich(2,d_AF_R84, dif_AF_R848)
GO_3_R848 <- GO_enrich(3,d_AF_R84, dif_AF_R848)
GO_4_R848 <- GO_enrich(4,d_AF_R84, dif_AF_R848)
GO_5_R848 <- GO_enrich(5,d_AF_R84, dif_AF_R848)
GO_6_R848 <- GO_enrich(6,d_AF_R84, dif_AF_R848)
GO_1_LPS <- GO_enrich(1,d_AF_LPS, dif_AF_LPS)
GO_2_LPS <- GO_enrich(2,d_AF_LPS, dif_AF_LPS)
GO_3_LPS <- GO_enrich(3,d_AF_LPS, dif_AF_LPS)
GO_4_LPS <- GO_enrich(4,d_AF_LPS, dif_AF_LPS)
GO_5_LPS <- GO_enrich(5,d_AF_LPS, dif_AF_LPS)
GO_6_LPS <- GO_enrich(6,d_AF_LPS, dif_AF_LPS)
GO_1_IAV <- GO_enrich(1,d_AF_IAV, dif_AF_IAV)
GO_2_IAV <- GO_enrich(2,d_AF_IAV, dif_AF_IAV)
GO_3_IAV <- GO_enrich(3,d_AF_IAV, dif_AF_IAV)
GO_4_IAV <- GO_enrich(4,d_AF_IAV, dif_AF_IAV)
GO_5_IAV <- GO_enrich(5,d_AF_IAV, dif_AF_IAV)
GO_6_IAV <- GO_enrich(6,d_AF_IAV, dif_AF_IAV)

1.3.1 Anlyzing results from GO enrichment.

To visualize results I retrieved the lists of those clusters enrichment in which the function of the genes in the cluster is clear and those with no significant GO enrichment. I will use ā€˜goplot’ tool to visualize clearly the enrichment in the rest of the clusters and arrive to a conclusion based on p-values and the relation between terms (nodes in the graph).

library("enrichplot")
library("ggnewscale")

1.3.1.1 For Pam3CSK4 treatment (activation of toll-like receptor 1).

#GO_1_PAM$Description
#GO_2_PAM$Description
#GO_3_PAM$Description
#GO_4_PAM$Description
#GO_5_PAM$Description
#GO_6_PAM$Description
goplot(GO_1_PAM, showCategory = 10)

For the first cluster selected, the differences in gene expression between populations in response to Pam3CSK4 treatment is related to RNA processing-ribosome biogenesis and translation/peptide synthesis, related to every three main categories of gene ontology BP, CC and MF (biological processes, cellular components and molecular function).

goplot(GO_2_PAM, showCategory = 10)

For the second cluster selected, the differences in gene expression between populations in response to Pam3CSK4 treatment is related to **positive regulation of calcium ion transport* -> lipogenic gene expression -> endocytosis and membrane transport related terms and *cell morphogenesis** all related to one of main categories of gene ontology, BP (biological processes).

goplot(GO_3_PAM, showCategory = 10)

For the third cluster selected, the differences in gene expression between populations in response to Pam3CSK4 treatment is related to immune response and defense/response to other organisms.

goplot(GO_4_PAM, showCategory = 10)

For the fourth cluster selected, the differences in gene expression between populations in response to Pam3CSK4 treatment is related to cell migration, regulation of blood coagulation and cell proliferation.

goplot(GO_5_PAM, showCategory = 4)
## Warning: ggrepel: 16 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

For the fourth cluster selected, the differences in gene expression between populations in response to Pam3CSK4 treatment is related to protein modification:peptidyl-tyrosine phosphorylation and myeloid cell differentiation.

goplot(GO_6_PAM, showCategory = 10)

In the cluster 6 for the Pam3CSK4 condition we encounter response related to the positive regulation of inflammatory response and blood circulation.

1.3.1.2 For R848 treatment (TLR-7/8 agonist)

#GO_1_R848$Description
#GO_2_R848$Description
#GO_3_R848$Description
GO_4_R848$Description
## character(0)
GO_5_R848$Description
## character(0)
GO_6_R848$Description
## character(0)

For cluster 4, 5 and 6 in R848 treatment interaction term we don’t find any significant ontology terms.

goplot(GO_1_R848, showCategory = 5)

For the first cluster of R848 condition interaction term we find differences in response to treatment related to the expression of genes which regulate lipid metabolic processes and specially catabolic processes.

goplot(GO_2_R848, showCategory = 5)

For the second cluster of R848 condition interaction term we find differences in response to treatment related to the expression of genes which regulate translation.

goplot(GO_3_R848, showCategory = 10)

For the first cluster of R848 we find differences in response to treatment related to the expression of genes which regulate a large number of processes involved in immune response and response to other organisms.

1.3.1.3 For LPS response (TLR-4 agonist)

#GO_1_LPS$Description
#GO_2_LPS$Description
#GO_3_LPS$Description
#GO_4_LPS$Description
GO_5_LPS$Description
## [1] "acute inflammatory response"
#GO_6_LPS$Description

For cluster 5 we find a clear term for gene ontology differentially expressed for populations in response to the treatment related to acute inflammatory response.

goplot(GO_1_LPS, showCategory = 5)

For the first cluster the genes that are differentially expressed for each population in response to the treatment are those involved in ribosomal RNA and non-coding RNA metabolic processes.

goplot(GO_2_LPS, showCategory = 5)

For the second cluster, the genes that are differentially expressed for each population in response to the treatment are those involved in cytokine production and regulation of the immune response.

goplot(GO_3_LPS, showCategory = 5)

For the third cluster, the genes that are differentially expressed for each population in response to the treatment are those involved in response to viruses and other organisms and biotic stimulus.

goplot(GO_4_LPS, showCategory = 5)

For the fourth cluster, the genes that are differentially expressed for each population in response to the treatment are those involved in chemokine-mediated signalling pathway and humoral immune response

goplot(GO_6_LPS, showCategory = 5)

For the sixth cluster, the genes that are differentially expressed for each population in response to the treatment are those involved in platelet activation, blood coagulation and cellular extravasation also in the adhesion of leukocytes into endotelial cells and megacariocytes differentiation.

1.3.1.4 For IAV (influenza A virus) treatment

#GO_1_IAV$Description
#GO_2_IAV$Description
#GO_3_IAV$Description
#GO_4_IAV$Description
#GO_5_IAV$Description
GO_6_IAV$Description
## [1] "G protein-coupled receptor signaling pathway"

For the sixth cluster, the genes that are differentially expressed for each population in response to the treatment are those involved in G protein-coupled receptor signaling pathway.

goplot(GO_1_IAV, showCategory = 5)

For the first cluster, the genes that are differentially expressed for each population in response to the treatment are those involved in ubiquitin-dependent protein catabolic pathway, protein postraslational modification and endomembrane system organization.

goplot(GO_2_IAV, showCategory = 5)

For the second cluster, the genes that are differentially expressed for each population in response to the treatment are those involved in translation and mitochondrial ATP synthesis coupled electron transport.

goplot(GO_3_IAV, showCategory = 5)

For the third cluster, the genes that are differentially expressed for each population in response to the treatment are those involved in cellular response to cytokine stimulus, immune response and response to other organisms.

goplot(GO_4_IAV, showCategory = 5)

For the third cluster, the genes that are differentially expressed for each population in response to the treatment are those involved in interleukin-6 production (related to cytokine production) and humoral immune response.

goplot(GO_5_IAV, showCategory = 5)

For the third cluster, the genes that are differentially expressed for each population in response to the treatment are those involved in plasma lipoprotein assembly, cell activation and regulation of immune system process.

2 Discussion.

First, I’ll be adding a column with GO terms conclusions in the gene expression data set to plot the figure panel:

d_AF_PAM$terms[d_AF_PAM$clusters==1] <- "RNA processing, ribosome biogenesis and translation/peptide synthesis"
d_AF_PAM$terms[d_AF_PAM$clusters==2] <- "positive regulation of calcium ion transport -> lipogenic gene expression -> endocytosis and cell morphogenesis"
d_AF_PAM$terms[d_AF_PAM$clusters==3] <- "immune response and defense/response to other organisms"
d_AF_PAM$terms[d_AF_PAM$clusters==4] <- "cell migration, regulation of blood coagulation and cell proliferation"
d_AF_PAM$terms[d_AF_PAM$clusters==5] <- "protein modification: peptidyl-tyrosine phosphorylation and myeloid cell differentiation"
d_AF_PAM$terms[d_AF_PAM$clusters==6] <- "inflamatory response and blood circulation"
d_AF_R84$terms[d_AF_R84$clusters==1] <- "lipid metabolic processes (catabolic more significantly)"
d_AF_R84$terms[d_AF_R84$clusters==2] <- "translation"
d_AF_R84$terms[d_AF_R84$clusters==3] <- "immune response and response to other organisms"
d_AF_R84$terms[d_AF_R84$clusters==4] <- "no significant GO enrichment terms"
d_AF_R84$terms[d_AF_R84$clusters==5] <- "no significant GO enrichment terms"
d_AF_R84$terms[d_AF_R84$clusters==6] <- "no significant GO enrichment terms"
d_AF_LPS$terms[d_AF_LPS$clusters==1] <- "ribosomal RNA and non-coding RNA metabolic processes"
d_AF_LPS$terms[d_AF_LPS$clusters==2] <- "cytokine production and regulation of the immune response"
d_AF_LPS$terms[d_AF_LPS$clusters==3] <- "response to viruses and other organisms and biotic stimulus"
d_AF_LPS$terms[d_AF_LPS$clusters==4] <- "chemokine-mediated signalling pathway and humoral immune response"
d_AF_LPS$terms[d_AF_LPS$clusters==5] <- "acute inflammatory response"
d_AF_LPS$terms[d_AF_LPS$clusters==6] <- "platelet activation, blood coagulation and cellular extravasation, adhesion of leukocytes into endotelial cells and megacariocytes diff."
d_AF_IAV$terms[d_AF_IAV$clusters==1] <- "ubiquitin-dependent protein catabolic pathway, protein postraslational modification and endomembrane system organization"
d_AF_IAV$terms[d_AF_IAV$clusters==2] <- "translation and mitochondrial ATP synthesis coupled electron transport"
d_AF_IAV$terms[d_AF_IAV$clusters==3] <- "cellular response to cytokine stimulus, immune response and response to other organisms"
d_AF_IAV$terms[d_AF_IAV$clusters==4] <- "interleukin-6 production (related to cytokine production) and humoral immune response"
d_AF_IAV$terms[d_AF_IAV$clusters==5] <- "plasma lipoprotein assembly, cell activation and regulation of immune system process"
d_AF_IAV$terms[d_AF_IAV$clusters==6] <- "G protein-coupled receptor signaling pathway"
v_PAM_t = ggplot(d_AF_PAM) + geom_point(aes(x=logFC,y=-log(Fdr),col=terms), size=2) +
  scale_color_manual(values=c("#696969", "#BC8F8F", "#DC143C", "#A0522D", "#CD5C5C", "#3B83BD")) + 
  ggtitle('PAM3CSK4 treatment response differences for each population') +
  theme(legend.text = element_text(size = 14), plot.title = element_text(size=15))

v_R848_t = ggplot(d_AF_R84)+geom_point(aes(x=logFC,y=-log(Fdr),col=terms), size=2) +
  scale_color_manual(values=c("#696969", "#BC8F8F", "#DC143C", "#A0522D", "#CD5C5C", "#3B83BD")) +
  ggtitle('R848 treatment response differences for each population') +
  theme(legend.text = element_text(size = 14),  plot.title = element_text(size=15))
  
v_LPS_t = ggplot(d_AF_LPS)+geom_point(aes(x=logFC,y=-log(Fdr),col=terms), size=2) +
  scale_color_manual(values=c("#696969", "#BC8F8F", "#DC143C", "#A0522D", "#CD5C5C", "#3B83BD")) +
  ggtitle('LPS treatment response differences for eachpopulation') +
  theme(legend.text = element_text(size = 13),  plot.title = element_text(size=15))

v_IAV_t = ggplot(d_AF_IAV)+geom_point(aes(x=logFC,y=-log(Fdr),col=terms), size=2) +
  scale_color_manual(values=c("#696969", "#BC8F8F", "#DC143C", "#A0522D", "#CD5C5C", "#3B83BD")) +
  ggtitle('IAV treatment response differences for each population') +
  theme(legend.text = element_text(size = 14),  plot.title = element_text(size=15))
library(grid)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:Biobase':
## 
##     combine
## The following object is masked from 'package:BiocGenerics':
## 
##     combine
## The following object is masked from 'package:dplyr':
## 
##     combine
title <- textGrob("Differencies in gene expression as a response to treatment for African individuals in relation to European individuals", gp = gpar(fontsize = 20, fontface = "bold"), vjust = 0.5)
grid.arrange(grobs=list(v_PAM_t, v_R848_t, v_LPS_t, v_IAV_t), ncol=1, top=title)

2.1 Results in Pam3CSK4 treatment, TRL1 activation:

For European individuals positive calcium transport was a main difference, this could be the result of a mechanism that bacteria use to adhere the surface of the cell thanks to change in potential and a way to facilitate bacterial invasion (doi: 10.1007/978-3-030-12457-1_33). Immune response is a bigger difference in expression that benefit Europeans.

African individuals had more chances to adapt their bacterial response to this kind of bacteria by processing and translating RNA related to genes mediating the adaptation to bacteria process (doi: 10.1261/rna.073288.119). Cell migration was also another mechanism that was more expressed in African individuals, it could be a mechanism of protection form bacteria this is due to need for immune cells to move to the infection, it is also linked with inflammatory response that is found as a difference for some highly differentially expressed genes (doi:10.1111/j.1600-0765.1966.tb01852.x). Then African individuals presented higher expression in genes involved myeloid cell differentiation (diffferentation of immune cells) and peptidyl-tirosine phosphorylation that start signaling pathways that control cell proliferation, migration, and adhesion.

2.2 Results in R848 treatment, TRL7/8 activation:

For European individuals the biggest difference was the higher expression of genes involved in immune response and the response to other organisms. Lipid metabolic processes, specially catabolic processes were also significant, this could be due to the need of across membrane cell transport.

Translation in the other hand was the main difference for African individuals being activated in the response, this process can be enhaced in response to bacteria and viruses as I mentioned beforeto enhace the process of adaptation.

2.3 Results in LPS treatment, TRL4 activation:

The diference for European individuals was by far the immune/biological response to viruses and other organisms. There were some big differences on humoral immune response specifically and in the chemokine signaling pathway and cytokine production resulting in adaptive inflammatory host defenses, cell growth, differentiation, cell death, angiogenesis, and development and repair processes (https://www.genome.jp/pathway/hsa04062#:~:text=The%20chemokine%20signal%20is%20transduced,cellular%20polarization%20and%20actin%20reorganization).

Africans present more chances of produce a higher inflammatory response (higher expression) and to activate non coding RNA and ribosomal RNA processes, non coding RNA indeed regulates rRNA. Ribosomal RNA can be linked with inflammatory response (doi: 10.3390/cells11091440).

2.4 Results in Influenza A Virus treatment:

For African individuals the bigger difference in expression is held by response to cytokine (small proteins involved in cell signaling), immune response and other organisms response. Ubiquitin-dependent protein catabolic pathway, protein postraslational modification and endomembrane system organization are also positive differences in expression. This means that are the main differences in response to the virus. Ubiquitin-dependent protein catabolic pathway has an antiviral role (doi: 10.3390/microorganisms8091424), the endomembrane system organization seems to be altered by viruses.

For European individuals the immune response was a benefitial difference. Another processes that were linked with higher expression levels were plasma lipoprotein assembley, translation (could be induced by the virus), IL6 production (signaling), and ATP synthesis (seems to be a detrimental response directly caused by virus, doi: 10.1093/infdis/jiad442).